Efficient Polynomial-Time Nested Loop Fusion with Full Parallelism
Authors
Abstract
Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way to reduce synchronization and improve data locality. Traditional fusion techniques, however, either cannot handle fusion-preventing dependencies in nested loops or cannot achieve good parallelism after fusion. This paper extends current loop fusion techniques with several efficient polynomial-time algorithms that solve these problems. These algorithms, based on multi-dimensional retiming, allow nested loop fusion even in the presence of outermost loop-carried dependencies or fusion-preventing dependencies. The multiple loops are modeled by a multi-dimensional loop dependence graph. The algorithms are applied to this graph to perform the fusion and to obtain full parallelism in the innermost loop.
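The idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the function names `unfused` and `fused_retimed`, the arrays, and the hand-picked retiming vector (1, 0) are assumptions chosen for illustration. The second loop reads `a[i][j + 1]`, which a naive fusion would read before it is written (a fusion-preventing dependence). Shifting the second loop by one iteration of the outer loop turns every dependence into an outer-loop-carried one, so the iterations of the fused inner loop no longer depend on each other.

```python
def unfused(x):
    """Two separate loop nests; the second consumes results of the first."""
    n, m = len(x), len(x[0])
    a = [[0] * m for _ in range(n)]
    b = [[0] * (m - 1) for _ in range(n)]
    for i in range(n):
        for j in range(m):
            a[i][j] = 2 * x[i][j]
    for i in range(n):
        for j in range(m - 1):
            # Reads a[i][j + 1]: in a naive fusion this value is not yet written.
            b[i][j] = a[i][j] + a[i][j + 1]
    return a, b


def fused_retimed(x, reverse_inner=False):
    """Fused nest with the second loop delayed by one outer iteration.

    Assumed retiming vector: (1, 0). The original dependence distances
    (0, 0) and (0, -1) become (1, 0) and (1, -1), both carried by the
    outer loop, so inner iterations may run in any order.
    """
    n, m = len(x), len(x[0])
    a = [[0] * m for _ in range(n)]
    b = [[0] * (m - 1) for _ in range(n)]
    for i in range(n + 1):  # one extra outer iteration drains the delayed loop
        inner = range(m - 1, -1, -1) if reverse_inner else range(m)
        for j in inner:
            if i < n:
                a[i][j] = 2 * x[i][j]
            if i >= 1 and j < m - 1:
                # Only row i - 1 of a is read, and it was completed earlier.
                b[i - 1][j] = a[i - 1][j] + a[i - 1][j + 1]
    return a, b


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6]]
    print(fused_retimed(x) == unfused(x))
    print(fused_retimed(x, reverse_inner=True) == unfused(x))
```

Running the inner loop in reverse order is used here as a simple stand-in for parallel execution: if the result does not depend on the order of the inner iterations, they carry no dependence among themselves.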
Related resources
Polynomial-Time Nested Loop Fusion with Full Parallelism
Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way to reduce synchronization and improve data locality. Traditional fusion techniques, however, either cannot address the case when fusion-preventing dependences exist in nested loops, or cannot achieve good parallelism after fu...
Parallélisme des nids de boucles pour l'optimisation du temps d'exécution et de la taille du code (Nested loop parallelism to optimize execution time and code size)
Real-time algorithm implementations always include nested loops that require significant execution time. Thus, several nested loop parallelism techniques have been proposed with the aim of decreasing their execution times. These techniques can be classified by granularity into iteration-level parallelism and instruction-level parallelism. In the case of the instructio...
Maximizing Loop
Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that ...
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that ...
Full Parallelism in Uniform Nested Loops Using Multi-Dimensional Retiming
Most scientific and DSP applications are recursive or iterative. Uniform nested loops can be modeled as multi-dimensional data flow graphs (DFGs). Achieving full parallelism of the loop body, i.e., executing all the computational nodes in parallel, substantially decreases the overall computation time. It is well known that for one-dimensional DFGs retiming cannot always achieve full parallelism...
Journal:
- I. J. Comput. Appl.
Volume 10, Issue
Pages -
Publication date: 2003